    Non-parametric Bayesian modelling of digital gene expression data

    Next-generation sequencing technologies provide a revolutionary tool for generating gene expression data. Starting with a fixed RNA sample, they construct a library of millions of differentially abundant short sequence tags or "reads", which constitute a fundamentally discrete measure of the level of gene expression. A common limitation in experiments using these technologies is the low number or even absence of biological replicates, which complicates the statistical analysis of digital gene expression data. Analysis of this type of data has often been based on modified tests originally devised for analysing microarrays; both these and even de novo methods for the analysis of RNA-seq data are plagued by the common problem of low replication. We propose a novel, non-parametric Bayesian approach for the analysis of digital gene expression data. We begin with a hierarchical model for modelling over-dispersed count data and a blocked Gibbs sampling algorithm for inferring the posterior distribution of model parameters conditional on these counts. The algorithm compensates for the problem of low numbers of biological replicates by clustering together genes with tag counts that are likely sampled from a common distribution and using this augmented sample for estimating the parameters of this distribution. The number of clusters is not decided a priori, but it is inferred along with the remaining model parameters. We demonstrate the ability of this approach to model biological data with high fidelity by applying the algorithm to a public dataset obtained from cancerous and non-cancerous neural tissues.

    Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases

    BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook whole genome sequencing (WGS) on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving. CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.

    Whole-genome sequencing of chronic lymphocytic leukemia identifies subgroups with distinct biological and clinical features

    The value of genome-wide over targeted driver analyses for predicting clinical outcomes of cancer patients is debated. Here, we report the whole-genome sequencing of 485 chronic lymphocytic leukemia patients enrolled in clinical trials as part of the United Kingdom's 100,000 Genomes Project. We identify an extended catalog of recurrent coding and noncoding genetic mutations that represents a source for future studies and provide the most complete high-resolution map of structural variants, copy number changes and global genome features including telomere length, mutational signatures and genomic complexity. We demonstrate the relationship of these features with clinical outcome and show that integration of 186 distinct recurrent genomic alterations defines five genomic subgroups that associate with response to therapy, refining conventional outcome prediction. While requiring independent validation, our findings highlight the potential of whole-genome sequencing to inform future risk stratification in chronic lymphocytic leukemia.

    Consistency management with repair actions

    Comprehensive consistency management requires a strong mechanism for repair once inconsistencies have been detected. In this paper we present a repair framework for inconsistent distributed documents. The core piece of the framework is a new method for generating interactive repairs from full first-order logic formulae that constrain these documents. We present a full implementation of the components in our repair framework, as well as their application to the UML and related heterogeneous documents such as EJB deployment descriptors. We describe how our approach can be used as an infrastructure for building higher-level, domain-specific frameworks and provide an overview of related work in the database and software development environment community.

    Simultaneous estimation of hidden model states (including intracellular calcium concentrations) and maximal conductances in a two-compartment model of a vertebrate motoneuron (II).

    Inference of maximal conductances and noise parameters during fixed-lag smoothing. (A) The standard deviations of the observation (Ai) and the intrinsic (Aii) noise at the soma and the dendrite. (B) Inferred maximal conductances of the sodium and potassium currents at the soma (Bi), of the N-type calcium current and the calcium-activated potassium current at the soma (Bii), of the calcium-activated potassium current at the dendrite (Biii) and of the N-type and L-type calcium currents at the dendrite (Biv). In all cases, parameter expectations gradually converged towards the true parameter values (dashed lines) after less than . The grey lines in Aii, Biii and Biv correspond to estimated parameters, when current was injected in the soma only. In these simulations, , , and the prior interval for was .

    True and estimated values and prior intervals used during smoothing for all parameters in the two-compartment conductance-based model.

    1 These parameter values were estimated when we used the broad prior intervals (see Fig. 11Ai). 2 Values in bold indicate the narrow prior intervals we used for generating Figs. 11Aii, 11B, 11C (and Supplementary Figs. S4 and S5).